Efficient and Effective Plagiarism Detection for Large Code Repositories
Abstract
The copying of programming assignments is a widespread problem in academic institutions. Manual plagiarism detection is time-consuming, and current popular plagiarism detection systems do not scale to large code repositories. While there are text-based plagiarism detection systems capable of handling millions of student papers, comparable systems for code-based plagiarism detection are in their infancy. In this paper, we propose and evaluate new techniques for code plagiarism detection. Using small and large collections of programs, we show that our approach is highly scalable while maintaining a level of effectiveness similar to that of JPlag.
Similar resources
Efficient plagiarism detection for large code repositories
Unauthorized re-use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time-consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text-based plagiarism detection systems capable of working with large collections...
Syntax tree fingerprinting: a foundation for source code similarity detection
Plagiarism detection and clone refactoring in software depend on one common concern: finding similar source chunks across large repositories. However, since code duplication in software is often the result of copy-paste behaviors, only minor modifications are expected between shared codes. On the contrary, in a plagiarism detection context, edits are more extensive and exact matching strategies...
Automatic slide assignation for language model adaptation
Online multimedia repositories are rapidly growing and imposing themselves as fundamental knowledge assets. This is particularly true in the area of education, where large repositories of video lectures are being built, making education accessible to a wide community of potential students. As with many other repositories, most lectures are not transcribed because of the lack of efficient soluti...
Overview and Comparison of Plagiarism Detection Tools
In this paper we give an overview of effective plagiarism detection methods that have been used for natural-language text plagiarism detection, external plagiarism detection, and clustering-based plagiarism detection, along with some methods used in source code plagiarism detection. We also compare five software tools used for textual plagiarism detection: (PlagAware, PlagScan, Chec...
Linguistic and Statistical Traits Characterising Plagiarism
This paper investigates the problem of distinguishing between original and rewritten text materials, with focus on the application of plagiarism detection. The hypothesis is that original texts and rewritten texts exhibit significant and measurable differences, and that these can be captured through statistical and linguistic indicators. We propose and analyse a number of these indicators (incl...
Publication date: 1981